    On the sub-permutations of pattern avoiding permutations

    There is a deep connection between permutations and trees. Certain sub-structures of permutations, called sub-permutations, map bijectively to sub-trees of binary increasing trees. This provides a powerful toolset to study enumerative and probabilistic properties of sub-permutations and to investigate the relationships between 'local' and 'global' features using the concept of pattern avoidance. First, given a pattern $\mu$, we study how the avoidance of $\mu$ in a permutation $\pi$ affects the presence of other patterns in the sub-permutations of $\pi$. More precisely, considering patterns of length 3, we solve instances of the following problem: given a class of permutations $K$ and a pattern $\mu$, we ask for the number of permutations $\pi \in Av_n(\mu)$ whose sub-permutations in $K$ satisfy certain additional constraints on their size. Second, we study the probability for a generic pattern to be contained in a random permutation $\pi$ of size $n$ without being present in the sub-permutations of $\pi$ generated by the entry $k$, where $1 \leq k \leq n$. These theoretical results can be useful for defining efficient randomized pattern-search procedures based on classical pattern-recognition algorithms, whereas the general problem of pattern search is NP-complete.
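The sub-permutation/sub-tree correspondence can be made concrete with a small Python sketch. It assumes the standard construction of the binary increasing tree of a permutation (the smallest entry is the root, and the entries to its left and right recursively build the left and right sub-trees), under which the sub-tree rooted at entry k is the maximal contiguous factor that contains k and has all entries >= k. Pattern containment is tested by brute force over subsequences, which is exponential in the pattern length and only meant for small cases, consistent with the general problem being NP-complete. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial


def contains(pi, mu):
    """Brute-force check whether permutation pi contains pattern mu.

    pi contains mu if some subsequence of pi is order-isomorphic to mu,
    i.e. every pair of positions compares the same way in both.
    """
    m = len(mu)
    for idx in combinations(range(len(pi)), m):
        sub = [pi[i] for i in idx]
        if all((sub[a] < sub[b]) == (mu[a] < mu[b])
               for a in range(m) for b in range(a + 1, m)):
            return True
    return False


def sub_permutation(pi, k):
    """Factor of pi forming the sub-tree rooted at entry k of its binary
    increasing tree (assumed construction): the maximal run around k
    whose entries are all >= k."""
    i = pi.index(k)
    lo, hi = i, i
    while lo > 0 and pi[lo - 1] >= k:
        lo -= 1
    while hi < len(pi) - 1 and pi[hi + 1] >= k:
        hi += 1
    return pi[lo:hi + 1]


def avoiders(n, mu):
    """All permutations of 1..n (as tuples) that avoid mu."""
    return [p for p in permutations(range(1, n + 1)) if not contains(p, mu)]


def prob_contained_outside(n, mu, k):
    """Exact probability that a uniform random permutation of size n contains
    mu while the sub-permutation generated by entry k does not."""
    hits = sum(1 for p in permutations(range(1, n + 1))
               if contains(p, mu) and not contains(sub_permutation(p, k), mu))
    return Fraction(hits, factorial(n))


if __name__ == "__main__":
    print([len(avoiders(n, (1, 3, 2))) for n in range(1, 7)])
    print(sub_permutation((3, 1, 4, 2, 5), 2))
    print(prob_contained_outside(5, (1, 2, 3), 3))
```

Here `[len(avoiders(n, (1, 3, 2))) for n in range(1, 7)]` gives the Catalan numbers [1, 2, 5, 14, 42, 132], and `sub_permutation((3, 1, 4, 2, 5), 2)` returns (4, 2, 5). For k = 1 the sub-permutation is the whole permutation, so `prob_contained_outside(n, mu, 1)` is always 0.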

    Yule-generated trees constrained by node imbalance

    The Yule process generates a class of binary trees that is fundamental to population genetic models and other applications in evolutionary biology. In this paper, we introduce a family of sub-classes of ranked trees, called Omega-trees, which are characterized by the imbalance of their internal nodes. The degree of imbalance is defined by an integer $\omega \geq 0$. For caterpillars, the extreme case of unbalanced trees, $\omega = 0$. Under models of neutral evolution, for instance the Yule model, trees with small $\omega$ are unlikely to occur by chance. Indeed, imbalance can be a signature of permanent selection pressure, such as can be observed in the genealogies of certain pathogens. From a mathematical point of view, it is interesting to observe that the space of Omega-trees maintains several statistical invariants, although it is drastically reduced in size compared to the space of unconstrained Yule trees. Using generating functions, we study some basic combinatorial properties of Omega-trees. We focus on the distribution of the number of subtrees with two leaves (cherries). We show that the expectation and variance of this distribution match those for unconstrained trees already for very small values of $\omega$.
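As a point of comparison for the Omega-tree results, the following Python sketch simulates unconstrained Yule trees (at each step a uniformly chosen leaf splits in two) and estimates the mean and variance of the number of cherries. For unconstrained trees with n leaves these moments are known to be n/3 (n >= 3) and 2n/45 (n >= 5). The Omega constraint itself is not implemented, since its precise definition is not given here; the function names are illustrative only.

```python
import random


def yule_tree(n, rng):
    """Grow a binary tree with n leaves under the Yule (pure-birth) process.

    Returns a children list: children[v] is [] for a leaf or [left, right]
    for an internal node. At each step a uniformly chosen leaf splits.
    """
    children = [[]]          # start with a single leaf, node 0
    leaves = [0]
    while len(leaves) < n:
        j = rng.randrange(len(leaves))
        v = leaves[j]
        a, b = len(children), len(children) + 1
        children[v] = [a, b]
        children.extend([[], []])
        leaves[j] = a        # replace the split leaf by its two new children
        leaves.append(b)
    return children


def cherries(children):
    """Number of internal nodes whose two children are both leaves."""
    return sum(1 for c in children
               if c and not children[c[0]] and not children[c[1]])


def cherry_moments(n, reps=20000, seed=1):
    """Monte Carlo mean and sample variance of the cherry count."""
    rng = random.Random(seed)
    xs = [cherries(yule_tree(n, rng)) for _ in range(reps)]
    mean = sum(xs) / reps
    var = sum((x - mean) ** 2 for x in xs) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    print(cherry_moments(20))   # compare with 20/3 and 40/45
```

A fixed seed makes the estimate reproducible; the simulated mean and variance should be close to, but not exactly equal to, n/3 and 2n/45.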

    Counting, grafting and evolving binary trees

    Binary trees are fundamental objects in models of evolutionary biology and population genetics. Here, we discuss some of their combinatorial and structural properties as they depend on the tree class considered. Furthermore, the process by which trees are generated determines the probability distribution in tree space. Yule trees, for instance, are generated by a pure birth process. When considered as unordered, they have neither a closed-form enumeration nor a simple probability distribution. But their ordered siblings have both. They are the objects of choice when studying tree structure in the framework of evolving genealogies.
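To make the ordered/unordered contrast concrete, here is a brute-force Python sketch, assuming that "ordered" means the two children of every internal node are distinguished as left and right. A Yule history is a sequence of choices (one of k leaves splits at the step with k leaves), so there are (n-1)! equally likely histories. Each history yields a distinct ordered ranked tree, which gives the closed-form count (n-1)! and a uniform distribution; forgetting the child order merges histories into unordered ranked shapes that are not equiprobable. All names are hypothetical.

```python
from itertools import product
from math import factorial


class Node:
    def __init__(self):
        self.rank = None      # step at which this node split (None = leaf)
        self.kids = []


def ordered_tree(choices):
    """Ordered ranked tree for one Yule history, as a nested tuple:
    () for a leaf, (rank, left, right) for an internal node.

    choices[r-1] is the left-to-right position of the leaf that splits
    at step r, when r leaves are present.
    """
    root = Node()
    leaves = [root]
    for r, c in enumerate(choices, start=1):
        node = leaves[c]
        node.rank = r
        node.kids = [Node(), Node()]
        leaves[c:c + 1] = node.kids   # keep leaves in left-to-right order

    def as_tuple(v):
        if v.rank is None:
            return ()
        return (v.rank, as_tuple(v.kids[0]), as_tuple(v.kids[1]))

    return as_tuple(root)


def unordered(t):
    """Canonical form that forgets the left/right order of children."""
    if t == ():
        return ()
    r, a, b = t
    a, b = sorted((unordered(a), unordered(b)))
    return (r, a, b)


def count_ranked(n):
    """(number of ordered, number of unordered) ranked trees with n leaves."""
    histories = product(*[range(k) for k in range(1, n)])
    ordered = {ordered_tree(h) for h in histories}
    return len(ordered), len({unordered(t) for t in ordered})


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_ranked(n), factorial(n - 1))
```

For n = 4 the six ordered trees collapse into two unordered ranked shapes, the caterpillar (four histories) and the balanced tree (two histories), so under the Yule process these shapes have probabilities 2/3 and 1/3.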

    Processes determining genetic variability: mutations in sequence space and hitchhiking

    Departing from the classical model of the so-called error threshold of mutating macromolecules, I have reformulated the model in the context of diploid organisms evolving in sequence space under the conditions of a finite population size. I found, for instance, that dominance properties have a substantial impact on the details of the error threshold (Chpt. 1). I then asked whether error thresholds can also be observed in fitness landscapes more general than the original single-peaked landscape. For smooth landscapes the answer is negative (Chpt. 2). Studying diploid organisms, I also investigated the impact of recombination on the evolutionary dynamics and on the possibility for a population to reach a fitness maximum. I concluded that the recombination rate, i.e., the chromosomal distance between interacting genetic loci, plays a much more important role in generating fitness-conferring allele combinations than manipulating the mutation rate does (Chpt. 3). Finally, considering a two-locus model in which one locus experiences beneficial mutations and a second locus is selectively neutral, I investigated the much-discussed model of genetic hitchhiking. Using diffusion theory, I predicted the impact of a beneficial mutation on the level of neutral polymorphism at a neighbouring genetic locus (Chpt. 4) and compared the predictions to experimental data on observed genetic variability in the fruit fly Drosophila. This led to an estimate of the rate and strength with which beneficial substitutions occur in natural populations (Chpt. 5).
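For orientation, a minimal Python sketch of the classical single-peaked error-threshold model that the thesis departs from. Assumptions: haploid sequences, infinite population, a master sequence of fitness sigma, all other sequences of fitness 1, and no back mutation to the master. The master frequency then settles at (sigma*Q - 1)/(sigma - 1) with Q = (1 - u)^L, and vanishes once sigma*Q <= 1, i.e. beyond the critical per-site rate u_c = 1 - sigma^(-1/L), approximately ln(sigma)/L. This is not the diploid, finite-population model of the thesis; function names are hypothetical.

```python
def master_fraction(sigma, L, u, generations=5000):
    """Iterate the classical single-peak recursion for the master frequency x.

    Q = (1 - u)**L is the probability of copying a length-L sequence without
    error; the mean fitness is sigma * x + (1 - x). Back mutation is ignored.
    """
    Q = (1.0 - u) ** L
    x = 1.0
    for _ in range(generations):
        x = sigma * Q * x / (sigma * x + (1.0 - x))
    return x


def predicted_fraction(sigma, L, u):
    """Stationary master frequency (sigma*Q - 1)/(sigma - 1), floored at 0."""
    Q = (1.0 - u) ** L
    return max(0.0, (sigma * Q - 1.0) / (sigma - 1.0))


def critical_rate(sigma, L):
    """Per-site error rate at which sigma * Q = 1."""
    return 1.0 - sigma ** (-1.0 / L)


if __name__ == "__main__":
    sigma, L = 10.0, 100
    print(critical_rate(sigma, L))
    for u in (0.01, 0.05):
        print(u, master_fraction(sigma, L, u), predicted_fraction(sigma, L, u))
```

With sigma = 10 and L = 100 the threshold lies near u = 0.023: below it the iteration converges to the predicted positive frequency, above it the master sequence is lost.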

    Exact enumeration of cherries and pitchforks in ranked trees under the coalescent model

    We consider exact enumerations and probabilistic properties of ranked trees generated under the random coalescent process. Using a new approach based on generating functions, we derive several statistics, such as the exact probability of finding $k$ cherries in a ranked tree of fixed size $n$. We then extend our method to also consider the number of pitchforks. We find a recursive formula to calculate the joint and conditional probabilities of cherries and pitchforks when the size of the tree is fixed.
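The cherry statistic can also be computed exactly with a short forward recursion, a different technique from the generating-function approach described above. Because the coalescent and the Yule process induce the same distribution on ranked tree shapes, one may grow the tree leaf by leaf: a uniformly chosen leaf splits, which creates a new cherry unless the leaf already belonged to one (in that case one cherry is destroyed and another created). The Python sketch below returns exact probabilities as fractions; the function name is hypothetical, and pitchforks are not treated.

```python
from fractions import Fraction


def cherry_distribution(n):
    """Exact P(k cherries) for a random ranked tree with n >= 2 leaves.

    Going from m to m + 1 leaves, a uniform leaf splits. With k cherries,
    2k of the m leaves lie in a cherry (k unchanged); the other m - 2k
    leaves create a new cherry (k -> k + 1).
    """
    dist = {1: Fraction(1)}          # two leaves form exactly one cherry
    for m in range(2, n):
        new = {}
        for k, p in dist.items():
            stay = Fraction(2 * k, m)
            new[k] = new.get(k, 0) + p * stay
            if stay < 1:
                new[k + 1] = new.get(k + 1, 0) + p * (1 - stay)
        dist = new
    return dict(sorted(dist.items()))


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, cherry_distribution(n))
```

For example, `cherry_distribution(5)` gives `{1: Fraction(1, 3), 2: Fraction(2, 3)}`; the known mean n/3 and variance 2n/45 (n >= 5) can be checked directly from the returned dictionary.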

    Decomposing the site frequency spectrum: the impact of tree topology on neutrality tests

    We investigate the dependence of the site frequency spectrum (SFS) on the topological structure of genealogical trees. We show that basic population genetic statistics, for instance estimators of $\theta$ or neutrality tests such as Tajima's $D$, can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that the values of Tajima's $D$ and Fay and Wu's $H$ depend in a direct way on a peculiar measure of tree balance, which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu's $H$ and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulae for these extreme values as a function of sample size and number of segregating sites.
    Comment: 23 pages, 8 figures
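The estimators and tests named above are linear combinations of the site frequency spectrum. As a reference sketch, the Python functions below compute Watterson's theta, Tajima's theta_pi, Fay and Wu's theta_H, Tajima's D with the standard Tajima normalization, and the unnormalized Fay and Wu's H = theta_pi - theta_H from an unfolded SFS. They do not implement the paper's tree-topology decomposition or its new test; the function names are illustrative.

```python
from math import comb, sqrt


def theta_estimators(sfs):
    """Return (theta_W, theta_pi, theta_H) from an unfolded SFS.

    sfs[i-1] is the number of segregating sites whose derived allele
    appears i times in a sample of size n = len(sfs) + 1.
    """
    n = len(sfs) + 1
    pairs = comb(n, 2)
    S = sum(sfs)
    a1 = sum(1 / i for i in range(1, n))
    theta_w = S / a1
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    theta_h = sum(i * i * x for i, x in enumerate(sfs, start=1)) / pairs
    return theta_w, theta_pi, theta_h


def tajimas_d(sfs):
    """Tajima's D: (theta_pi - theta_W) divided by its standard deviation."""
    n = len(sfs) + 1
    S = sum(sfs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    theta_w, theta_pi, _ = theta_estimators(sfs)
    return (theta_pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))


def fay_wu_h(sfs):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H."""
    _, theta_pi, theta_h = theta_estimators(sfs)
    return theta_pi - theta_h


if __name__ == "__main__":
    n, theta = 10, 5.0
    neutral = [theta / i for i in range(1, n)]   # expected neutral SFS
    print(tajimas_d(neutral), fay_wu_h(neutral))
```

Under the neutral expectation xi_i = theta/i all three estimators equal theta, so D and H are both 0 (up to rounding). An excess of singletons pushes D below 0, while an excess of high-frequency derived variants makes H negative.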

    The expected neutral frequency spectrum of linked sites

    We present an exact, closed expression for the expected neutral site frequency spectrum for two neutral sites, the 2-SFS, without recombination. This spectrum is the immediate extension of the well-known single-site $\theta/f$ neutral SFS. Similar formulae are also provided for the expected SFS of sites that are linked to a focal neutral mutation of known frequency. Formulae for finite samples are obtained by coalescent methods, and remarkably simple expressions are derived for the SFS of a large population, which are also solutions of the multi-allelic Kolmogorov equations. Besides the general interest of these new spectra, they relate to interesting biological cases such as structural variants and introgressions. As an example, we present the expected neutral frequency spectrum of regions with a chromosomal inversion.
    Comment: 26 pages, 5 figures
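The single-site spectrum that the 2-SFS extends has a simple sample version: for a sample of size n, E[xi_i] = theta/i, because the expected total length of coalescent branches subtending exactly i sampled leaves is 2/i (time in units of 2N generations, theta = 4N*mu, so mutations fall at rate theta/2 per unit branch length). The following Python sketch checks the branch-length expectation by Monte Carlo simulation of the Kingman coalescent. It does not reproduce the paper's 2-SFS formulae, and the function names are hypothetical.

```python
import random


def mean_branch_lengths(n, reps=50000, seed=1):
    """Monte Carlo estimate of E[L_i], the total length of branches that
    subtend exactly i of n sampled leaves, for i = 1..n-1, under the
    Kingman coalescent (time in units of 2N generations)."""
    rng = random.Random(seed)
    totals = [0.0] * n                 # totals[i] accumulates L_i
    for _ in range(reps):
        sizes = [1] * n                # leaves subtended by each lineage
        while len(sizes) > 1:
            k = len(sizes)
            t = rng.expovariate(k * (k - 1) / 2)   # time to next merger
            for s in sizes:
                totals[s] += t
            a, b = sorted(rng.sample(range(k), 2), reverse=True)
            merged = sizes.pop(a) + sizes.pop(b)   # pop larger index first
            sizes.append(merged)
    return [totals[i] / reps for i in range(1, n)]


def expected_sfs(n, theta):
    """Neutral single-site expectation E[xi_i] = theta / i, i = 1..n-1."""
    return [theta / i for i in range(1, n)]


if __name__ == "__main__":
    n, theta = 6, 2.0
    sim = [theta / 2 * L for L in mean_branch_lengths(n)]
    print(sim)
    print(expected_sfs(n, theta))
```

The simulated spectrum agrees with theta/i up to Monte Carlo error; the continuous theta/f form is the large-population counterpart of this sample spectrum.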

    Genome comparison without alignment using shortest unique substrings

    BACKGROUND: Sequence comparison by alignment is a fundamental tool of molecular biology. In this paper we show how a number of sequence comparison tasks, including the detection of unique genomic regions, can be accomplished efficiently without an alignment step. Our procedure for nucleotide sequence comparison is based on shortest unique substrings. These are substrings which occur only once within the sequence or set of sequences analysed and which cannot be further reduced in length without losing the property of uniqueness. Such substrings can be detected using generalized suffix trees. RESULTS: We find that the shortest unique substrings in Caenorhabditis elegans, human and mouse are no longer than 11 bp in the autosomes of these organisms. In mouse and human these unique substrings are significantly clustered in upstream regions of known genes. Moreover, the probability of finding such short unique substrings in the genomes of human or mouse by chance is extremely small. We derive an analytical expression for the null distribution of shortest unique substrings, given the GC-content of the query sequences. Furthermore, we apply our method to rapidly detect unique genomic regions in the genome of Staphylococcus aureus strain MSSA476 compared to four other staphylococcal genomes. CONCLUSION: We combine a method to rapidly search for shortest unique substrings in DNA sequences and a derivation of their null distribution. We show that unique regions in an arbitrary sample of genomes can be efficiently detected with this method. The corresponding programs shustring (SHortest Unique subSTRING) and shulen are written in C and available at
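A minimal Python sketch of the notion of a shortest unique substring, using a brute-force count of all substrings of increasing length instead of the generalized suffix trees used by the authors' programs. It returns the globally shortest substrings that occur exactly once in a sequence or set of sequences; because every proper substring of such a string is shorter, and therefore not unique, these strings cannot be shortened without losing uniqueness, as the definition requires. The function name is hypothetical and this is not the shustring or shulen implementation; it only suits short sequences.

```python
from collections import Counter


def shortest_unique_substrings(seqs):
    """Return (length, substrings) for the shortest substrings occurring
    exactly once across all sequences in seqs (overlaps counted).

    Returns (0, []) if no substring is unique, e.g. for duplicated input.
    """
    longest = max((len(s) for s in seqs), default=0)
    for length in range(1, longest + 1):
        counts = Counter(s[i:i + length]
                         for s in seqs
                         for i in range(len(s) - length + 1))
        unique = sorted(w for w, c in counts.items() if c == 1)
        if unique:
            return length, unique
    return 0, []


if __name__ == "__main__":
    print(shortest_unique_substrings(["ACGTACGTT"]))
```

For `["ACGTACGTT"]` no single nucleotide is unique, but the dinucleotides TA and TT each occur once, so the function returns `(2, ['TA', 'TT'])`.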